BePLi Dataset v1: Beach Plastic Litter Dataset version 1 for instance segmentation of beach plastic litter

Marine plastic pollution is a pressing global issue. To address this problem, automated image analysis techniques that can identify plastic litter are needed for scientific research and coastal management. The Beach Plastic Litter Dataset version 1 (BePLi Dataset v1) comprises 3709 original images taken in various coastal environments, along with instance-based, pixel-level annotations for all plastic litter objects visible in the images. The annotations were compiled in the Microsoft Common Objects in Context (MS COCO) format, partially modified from the original format. The dataset enables the development of machine-learning models for instance-level and/or pixel-wise identification of beach plastic litter. All original images in the dataset were extracted from the records of a beach litter monitoring program operated by the local government of Yamagata Prefecture, Japan. The images were taken against different backgrounds, such as sand beaches, rocky beaches, and tetrapods. The instance segmentation annotations were made manually and cover all plastic objects, including PET bottles, containers, fishing gear, and styrene foam, all of which were assigned to a single class, "plastic litter". Technologies developed using this dataset could make the estimation of plastic litter volume more scalable, helping researchers, individuals, and the government to monitor or analyze beach litter and the corresponding pollution levels.


Value of the Data
• Marine plastic pollution is a major global issue. To address it, the amount of plastic litter in the natural environment must be determined for scientific research and management purposes. Automated image analysis techniques for identifying plastic litter are essential for such quantification, but developing them first requires training datasets for machine learning.
• Accessible datasets for the segmentation of beach plastic litter are extremely rare. This dataset is unique in that it was built from images depicting litter in its natural state, which makes it highly practical. The dataset is also valuable because manually generated, high-quality pixel-level annotations are more expensive to produce than labels for image classification or bounding-box-based object detection.
• This dataset can be used to develop machine learning technologies that detect beach plastic litter at the pixel level, which could make the estimation of plastic litter volume more scalable. These technologies can assist researchers and local communities, including the local government, in monitoring and analyzing beach litter and pollution levels.
• Additionally, this dataset can serve as a benchmark for researchers developing improved technologies for similar tasks. Because it provides both bounding-box-based and pixel-based annotations, it can also support different levels of technology development depending on the user, from counting objects to estimating litter coverage.
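The two uses mentioned in the last bullet, counting objects and estimating litter coverage, can both be derived from a COCO-style annotation file. The sketch below is a minimal illustration, assuming the standard MS COCO fields ("images" entries with "id", "width", and "height"; "annotations" entries with "image_id" and a pixel "area"); the function name `summarize_images` and the toy values are hypothetical and not taken from the dataset. Coverage is approximated as summed mask area divided by image area, which overestimates coverage wherever masks overlap.

```python
from collections import defaultdict


def summarize_images(coco):
    """Return {image_id: (object_count, coverage_fraction)} for a COCO-style dict.

    Assumes each annotation has "image_id" and a pixel "area", and each image
    entry has "id", "width", and "height", as in the standard MS COCO layout.
    """
    counts = defaultdict(int)
    areas = defaultdict(float)
    for ann in coco["annotations"]:
        counts[ann["image_id"]] += 1
        areas[ann["image_id"]] += ann["area"]

    summary = {}
    for img in coco["images"]:
        image_pixels = img["width"] * img["height"]
        # Summing mask areas overestimates coverage if masks overlap.
        coverage = areas[img["id"]] / image_pixels
        summary[img["id"]] = (counts[img["id"]], coverage)
    return summary


# Toy example with hypothetical values (not taken from the dataset).
toy = {
    "images": [{"id": 1, "width": 100, "height": 50},
               {"id": 2, "width": 10, "height": 10}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "area": 250.0},
        {"id": 2, "image_id": 1, "category_id": 1, "area": 500.0},
    ],
}
print(summarize_images(toy))  # {1: (2, 0.15), 2: (0, 0.0)}
```

Images without any annotation are still reported, with a count of zero, so per-image statistics cover the whole split.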

Objective
The quantification of beach litter is a fundamental step in understanding the severity of pollution on beaches and addressing related issues. The study in [2] developed a technique to identify artificial litter objects on beaches at the pixel level, but with the dataset [3] employed in that research, plastic litter could not be identified separately and individual objects could not be counted. The BePLi Dataset v1 enables the development of machine learning models that identify beach plastic litter at the instance or pixel level. It was created to advance this technique further and to specialize it in estimating plastic litter quantities, which is in high demand from society.

BePLi Dataset v1
The BePLi Dataset v1 comprises 3709 original images of beach plastic litter and the corresponding manually created instance segmentation annotations (Fig. 1). The images were taken at beaches along the entire coastline of Yamagata Prefecture, Japan, which faces the Japan Sea. The annotations for the object detection task are provided as JSON files compiled in the Microsoft Common Objects in Context (MS COCO) format (deliberately modified in part). All target objects are made of plastic and were assigned to a single object class, "plastic_litter". The BePLi Dataset v1 is provided in the "plastic_coco" directory, which contains the original images for training, validation, and testing as well as the corresponding MS COCO JSON annotation files. In the original MS COCO format, a single object is annotated with a polygon and "iscrowd" set to "0", whereas for a crowd of objects "iscrowd" is set to "1" and the annotation is encoded using run-length encoding (RLE). However, because a polygon cannot express a hollowed-out object such as a tire, the BePLi Dataset v1 uses RLE for the instance segmentation of single objects. Accordingly, the "iscrowd" entry was intentionally set to "1", even though every annotation describes a single object.
The BePLi Dataset v1 contains 119192 annotations in total, an average of 32.14 annotations per image. Fig. 3 shows the number of annotations per image in the training, validation, and test sets, and Fig. 4 shows the size distribution (in pixels) of the instance segmentation annotations.
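To make the RLE representation concrete: in MS COCO, "counts" lists alternating run lengths of background (0) and foreground (1) pixels, starting with background and scanning the mask column by column. pycocotools stores these counts either as a list of integers or as a compact ASCII string. The dependency-free sketch below mirrors the string codec in pycocotools' C source (`rleToString` and `rleFrString` in maskApi.c) so that annotations can be inspected by hand; it is an illustration, not the dataset creators' tooling, and all function names are hypothetical. Whether the BePLi JSON files store strings or integer lists should be checked in the files themselves, and in practice `pycocotools.mask.decode` performs the decoding directly. The toy ring-shaped mask shows how RLE represents a hollow object such as a tire.

```python
def encode_counts(counts):
    """Compress RLE run lengths into pycocotools' ASCII string form."""
    chars = []
    for i, count in enumerate(counts):
        # From the fourth run on, store the difference to the run two places back.
        x = count - counts[i - 2] if i > 2 else count
        more = True
        while more:
            c = x & 0x1F  # lowest 5 bits
            x >>= 5       # arithmetic shift: negative values stay negative
            # Stop once the remaining bits only repeat the sign bit (0x10).
            more = (x != -1) if (c & 0x10) else (x != 0)
            if more:
                c |= 0x20  # continuation flag
            chars.append(chr(c + 48))
    return "".join(chars)


def decode_counts(s):
    """Inverse of encode_counts: recover the integer run lengths."""
    counts = []
    p = 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            c = ord(s[p]) - 48
            x |= (c & 0x1F) << (5 * k)
            more = bool(c & 0x20)
            p += 1
            k += 1
            if not more and (c & 0x10):
                x |= -1 << (5 * k)  # sign-extend a negative difference
        if len(counts) > 2:
            x += counts[-2]  # undo the difference coding
        counts.append(x)
    return counts


def counts_to_mask(counts, height, width):
    """Expand run lengths (background first, column-major) into rows of 0/1."""
    flat = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the image size")
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


def mask_to_counts(mask):
    """Run-length encode rows of 0/1 in column-major order, background first."""
    height, width = len(mask), len(mask[0])
    counts, current, run = [], 0, 0
    for col in range(width):
        for row in range(height):
            if mask[row][col] != current:
                counts.append(run)
                current, run = mask[row][col], 0
            run += 1
    counts.append(run)
    return counts


# Toy 5x5 "tire": a ring of plastic around a hollow centre.
ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
counts = mask_to_counts(ring)
print(counts)                 # [6, 3, 2, 1, 1, 1, 2, 3, 6]
print(encode_counts(counts))  # 632NO0124
assert counts_to_mask(decode_counts(encode_counts(counts)), 5, 5) == ring
```

One practical consequence of the "iscrowd" choice: pycocotools' `COCOeval` treats ground-truth annotations with "iscrowd" set to 1 as regions to ignore, so users evaluating models with it may need to reset the flag to 0 (keeping the RLE segmentation) beforehand.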

Experimental Design, Materials and Methods
The BePLi Dataset v1 was created using 3709 original images obtained from beaches along the entire coast of Yamagata Prefecture, which is situated in northern Japan (38°14′26″N, 140°21′48″E), with its coastline facing the Japan Sea, where a significant amount of litter drifts in from neighboring Asian countries. The images were obtained from a beach litter monitoring program that the local government has operated twice a year, in spring and autumn, at 167 monitoring sites since 2011. The monitoring method, developed by the Non-Profit Organization Partnership Office, is visual-based; details can be found in [4]. During monitoring, observers photographed the beach and litter with standard consumer digital cameras from three (front, left, and right) or four (front, left, right, and back) directions and took additional close-up shots of the litter. The images were taken in different settings, such as sand beaches, rocky beaches, and tetrapods. All images had been pasted into monitoring records in Microsoft Excel files, from which the 3709 images were extracted for the BePLi Dataset v1. The Excel files included information on where each image was taken, but this information cannot be made public due to an agreement between the local government and the dataset creators. To create the corresponding instance segmentation annotations, manual annotation was performed in Adobe Photoshop (Adobe Inc.): each individual plastic litter object was segmented and saved as a mask PNG file. Annotations were given for all objects made of plastic, such as PET bottles, containers, fishing gear, styrene foam, and plastic fragments, and all annotations were assigned to a single category, "plastic_litter". The initial manual annotations were produced by an outsourcing company, after which researchers, engineers, and students performed data checks and quality control.
The tolerance for errors in the manual instance segmentation annotations was ±3 pixels. Finally, JSON files in the MS COCO (object detection task) format were generated from the mask PNG files using Python libraries such as Pycocotools 2.0 and Pillow 8.4, together with custom code developed for this task. The JSON files were deliberately modified in part, as described in the data description section.
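The paper does not specify how the ±3-pixel tolerance was checked. One possible interpretation, sketched below under that assumption, is that an annotated mask is acceptable if it contains the reference mask eroded by 3 pixels and lies within the reference mask dilated by 3 pixels, with distance measured over a square (8-connected) neighborhood. The function names and toy masks are hypothetical; with NumPy available, the same morphology would normally use library routines such as `scipy.ndimage.binary_dilation` rather than nested loops.

```python
def dilate(mask, radius):
    """Binary dilation with a (2*radius+1) square window (Chebyshev distance)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def erode(mask, radius):
    """Binary erosion as the complement of dilating the background.

    Pixels outside the image are treated as foreground, so objects touching
    the image border are not eroded from that side.
    """
    inverted = [[1 - v for v in row] for row in mask]
    return [[1 - v for v in row] for row in dilate(inverted, radius)]


def within_tolerance(candidate, reference, radius=3):
    """True if candidate lies between erode(reference) and dilate(reference)."""
    inner = erode(reference, radius)
    outer = dilate(reference, radius)
    for r in range(len(reference)):
        for c in range(len(reference[0])):
            if inner[r][c] and not candidate[r][c]:
                return False  # misses the core of the reference object
            if candidate[r][c] and not outer[r][c]:
                return False  # extends too far beyond the reference object
    return True


def square(size, top, left, side):
    """Hypothetical toy mask: a filled square inside a size x size image."""
    return [[1 if top <= r < top + side and left <= c < left + side else 0
             for c in range(size)] for r in range(size)]


reference = square(20, 5, 5, 10)
print(within_tolerance(square(20, 5, 7, 10), reference))   # True (2-pixel shift)
print(within_tolerance(square(20, 5, 10, 10), reference))  # False (5-pixel shift)
```

Measuring distance with a square window is one design choice; a Euclidean criterion would be slightly stricter along diagonals.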

Ethics Statements
Not applicable.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.